Fairness or unfairness may be an attribute of a test
per se, or of its use, or of its statistical treatment. A
hypothetical situation designed to be intrinsically fair and unbiased
is used to show that analysis of covariance as a statistical method
may introduce bias to the treatment of test scores. In contrast,
equipercentile equating methods are shown, in this situation, to
result in a fair and unbiased treatment of test scores. A graphic
figure illustrates the comparison of the two different methods of
analysis. (Author)






RB-75-12 









TEST FAIRNESS: A COMMENT ON FAIRNESS IN STATISTICAL ANALYSIS

Charles T. Myers






This Bulletin is a draft for interoffice circulation.
Corrections and suggestions for revision are solicited.
The Bulletin should not be cited as a reference without
the specific permission of the author. It is automatically
superseded upon formal publication of the material.



Educational Testing Service
Princeton, New Jersey 
April 1975 







Abstract

An argument is presented to suggest that the analysis of covariance may in
some circumstances be an unfair method to use in the study of the question of
test fairness. As an alternative, the use of equipercentile methods or
equivalent linear methods may be preferred in these circumstances.

Fairness, like beauty, may well be in the eye of the beholder. There
is no doubt that test fairness is difficult to define, to evaluate, or to
prove or disprove. It may be a mistake to try to categorize a test or a
test usage as either fair or biased. Instead a test or test usage should
be evaluated as being either more or less fair than other available
alternatives. Fairness in decision making, in an absolute sense, may be an
impossible ideal. But in spite of all these difficulties and ambiguities,
the maker and the user of tests are obligated to maintain the highest
possible standard of fairness. There is also an obligation to clarify the
meaning of the concepts of test fairness.

There have been a number of different and even incompatible definitions
by such persons as Thorndike (1971), Darlington (1971), and Cole
(Note 1) of what is meant by "fairness" or, conversely, what is meant by bias
in test scores. A distinction has been made by Flaugher (Note 2) between
a biased test and the biased use of a fair test. This paper is an attempt
to present a rationale for a fair analysis for determining whether a test
is biased. What we intend to do is to describe data from a situation
that appears to be intrinsically fair; and then we will compare two
different statistical techniques for analyses of those data. We expect to show
that the traditional technique may in some cases be intrinsically unfair
and that the other technique may sometimes be preferable.

In a now famous study of test bias, Cleary (1968) said: "A test is
biased for members of a subgroup of the population if, in the prediction
of a criterion for which the test was designed, consistent nonzero errors
of prediction are made for members of the subgroup." In this study she
used the traditional regression method of analysis of covariance.

However, it may be instructive to consider a situation that can be
assumed to be fair and then consider what would happen if we applied the
analysis of covariance to data from that situation. Let us imagine a
verbal aptitude test designed for use in fifth, sixth, and seventh grades
and a parallel form of that test. Let us call these two tests Test X and
Test Y. Let us assume that these tests are similar in content and in the
quality and difficulty of the items of which they are made up and that the
test is equally appropriate for use in all three grades. Further, let us
assume that Test X and Test Y have been carefully constructed so that any
numerical score on Test Y is equivalent in meaning to the same numerical
score on Test X. Under these circumstances, it seems reasonable to suppose
that Test X is a fair test for predicting scores on Test Y.

Let us for convenience imagine that Tests X and Y have scores that
range from 0 to 100 and that for either test the mean score for grade-seven
children was 70, the mean score for grade-six children was 50, and
the mean score for grade-five children was 30. Further imagine that the
standard deviation of each within-grade distribution was 15, for each grade
and for each test. Finally let us assume that the within-grade correlation
between the two tests for each grade was 0.80. Obviously we are imagining
hypothetical data simplified for the purpose of presenting a theoretical
position, but these hypothetical values are unrealistic because they are so
regular, not because they are outside the normal range of common experience.

Given these conditions, consider what would happen if we applied
analysis of covariance to the question: In comparison with its use for
seventh-graders, is Test X a fair test for fifth-graders for the purpose
of predicting scores on Test Y? Notice that we have supplied information
to suggest that Test X and Test Y are identical in all the comparisons we
have made, and it seems that we may say intuitively that Test X is fair for
that use. However, according to analysis of covariance, grade seven and
grade five would not have identical regression lines. The regression lines
would be parallel, but their Y intercepts would be different. Grade seven
would have a Y intercept of 14 while grade five would have a Y intercept
of 6, giving a difference of 8 points on the Y scale. The same difference
would be found at other score levels. A fifth-grader with an X score of 50
would have a predicted Y score of 46, but a seventh-grader with an X score
of 50 would have a predicted Y score of 54. According to analysis of
covariance as used by Cleary and several others, Test X may be considered
biased against fifth-graders.
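
The intercepts and predicted scores quoted in this paragraph can be checked
with a few lines of arithmetic. The sketch below is only an illustration: it
assumes the usual least-squares formulas (slope = r * SD_Y / SD_X, intercept =
mean_Y - slope * mean_X) and a within-grade correlation of 0.80, which is the
value implied by the stated intercepts, since 70 * (1 - r) = 14. The function
and variable names are hypothetical.

```python
# Hypothetical values taken from the illustration in the text.
GRADE_MEANS = {"five": 30.0, "six": 50.0, "seven": 70.0}  # same mean on Test X and Test Y
SD = 15.0  # within-grade standard deviation, both tests, all grades
R = 0.80   # within-grade correlation (assumed; implied by the stated intercepts)


def regression_line(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) of the least-squares line predicting Y from X."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(line, x):
    """Predicted Y score for an X score on the given (slope, intercept) line."""
    slope, intercept = line
    return intercept + slope * x


# One within-grade regression line per grade.
lines = {g: regression_line(m, m, SD, SD, R) for g, m in GRADE_MEANS.items()}

for grade in ("five", "seven"):
    slope, intercept = lines[grade]
    print(f"grade {grade}: slope={slope:.2f}, intercept={intercept:.1f}, "
          f"predicted Y at X=50: {predict(lines[grade], 50.0):.1f}")
```

Because the slopes are equal, the gap between the two lines is the same at
every X score, which is the sense in which the regression lines are parallel.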

If a situation which was designed by definition to be fair is shown
by analysis of covariance to be unfair, this suggests that perhaps analysis
of covariance is inappropriate as a technique for studying this question.
To make this point clearer, consider what would happen in this situation
if we used Test Y scores to predict Test X scores for fifth-graders. Then
we would find that Test Y was biased against fifth-graders in exactly the
same amount. We have now reached the anomalous conclusion that both tests
are unfair and that both are biased in exactly the same way and amount in
relation to the other.
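
The symmetry of this result can be made explicit by swapping the roles of the
two tests. The following sketch is an illustration only, assuming least-squares
regression within each grade and a within-grade correlation of 0.80 (the value
implied by the intercepts of 14 and 6 given in the text). All names are
hypothetical.

```python
# (mean, SD) for grades five and seven on each test, as given in the text.
TEST_X = {"five": (30.0, 15.0), "seven": (70.0, 15.0)}
TEST_Y = {"five": (30.0, 15.0), "seven": (70.0, 15.0)}
R = 0.80  # assumed within-grade correlation


def intercept(predictor, criterion, grade, r=R):
    """Intercept of the within-grade line predicting the criterion from the predictor."""
    mean_p, sd_p = predictor[grade]
    mean_c, sd_c = criterion[grade]
    slope = r * sd_c / sd_p
    return mean_c - slope * mean_p


def intercept_gap(predictor, criterion):
    """Grade-seven intercept minus grade-five intercept."""
    return (intercept(predictor, criterion, "seven")
            - intercept(predictor, criterion, "five"))


gap_y_from_x = intercept_gap(TEST_X, TEST_Y)  # Test X used to predict Test Y
gap_x_from_y = intercept_gap(TEST_Y, TEST_X)  # Test Y used to predict Test X

print(f"gap when predicting Y from X: {gap_y_from_x:.1f}")
print(f"gap when predicting X from Y: {gap_x_from_y:.1f}")
```

In this symmetric setup the gap reduces to (1 - r) times the difference in
group means, so it is the same in both directions and vanishes only when the
within-grade correlation is perfect.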


As an alternative to analysis of covariance for studying test fairness,
let us consider what would happen if we used equating or calibrating
methods. Again let us disregard sampling errors and departures from
linearity in order to clarify the analysis. For the kind of situation that
we have described, the type of equating or calibrating most likely to be
used would be equipercentile equating or the linear equivalent of setting
means and standard deviations equal. From what we have been told about
these two tests and the three grade groups, we would normally predict that
(except for sampling errors) all three grades would show Test X as being
equivalent to Test Y and therefore unbiased. The implication we draw from
this analysis is that calibrating procedures are sometimes to be preferred
to analysis of covariance in studies of test fairness or of test bias.
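
A minimal sketch of the two equating approaches named above, applied to the
hypothetical grade groups. Linear equating is taken here to mean matching
means and standard deviations; the equipercentile version additionally assumes
normally distributed scores within each grade, which the text does not state
and which is used only to make the percentile ranks computable. Function names
are hypothetical.

```python
from statistics import NormalDist

GRADE_MEANS = {"five": 30.0, "six": 50.0, "seven": 70.0}  # same on Test X and Test Y
SD = 15.0  # within-grade standard deviation on both tests


def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Map an X score to the Y scale by matching means and standard deviations."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)


def equipercentile_equate(x, dist_x, dist_y):
    """Map an X score to the Y score that has the same percentile rank."""
    return dist_y.inv_cdf(dist_x.cdf(x))


results = {}
for grade, mean in GRADE_MEANS.items():
    dist_x = NormalDist(mean, SD)  # assumed normal within-grade distribution of X
    dist_y = NormalDist(mean, SD)  # assumed normal within-grade distribution of Y
    results[grade] = (
        linear_equate(50.0, mean, SD, mean, SD),
        equipercentile_equate(50.0, dist_x, dist_y),
    )
    print(f"grade {grade}: X=50 equates to Y={results[grade][0]:.1f} (linear), "
          f"Y={results[grade][1]:.1f} (equipercentile)")
```

Under these assumptions every grade yields the same equating line, Y = X, so a
given X score is treated identically regardless of grade.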

It may be advantageous to present these concepts graphically. In the
figure below, two overlapping bivariate distributions are shown, with the
members of the higher-scoring group indicated by /'s and the members of
the lower-scoring group indicated by 0's.

[Figure: two overlapping bivariate distributions, with Y-Scores on the
vertical axis and X-Scores on the horizontal axis.]


In the figure above, the two slanting solid lines are the two regression
lines and the dashed line is the equipercentile equating line. The
lower of the two regression lines represents the regression equation for
the lower-scoring group. In this illustration (which is admittedly
hypothetical, but may be realistic) any particular X-score would be used to
predict a lower Y-score for a member of the lower-scoring group than it
would for a member of the higher-scoring group. In this particular
illustration, the standard deviations for both groups on both variables are all
equal, the means on both variables for the lower-scoring group are one
standard deviation lower than for the other group, and the two within-group
correlations are equal to 0.50.
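
The configuration of the figure can be imitated with simulated data. The
sketch below is an illustration under stated assumptions, not a reconstruction
of the original figure: scores are drawn from bivariate normal distributions
in standard-deviation units (higher-scoring group mean 0, lower-scoring group
mean -1, within-group correlation 0.50), and the sample size, seed, and
function names are arbitrary choices.

```python
import random
import statistics


def simulate_group(mean, sd, r, n, rng):
    """Draw n (x, y) pairs with the same mean and SD on both variables and correlation r."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mean + sd * z1
        y = mean + sd * (r * z1 + (1.0 - r * r) ** 0.5 * z2)
        pairs.append((x, y))
    return pairs


def fit_regression(pairs):
    """Least-squares (slope, intercept) for predicting y from x."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_linear_equating(pairs):
    """(slope, intercept) of the line that matches the means and SDs of x and y."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    slope = statistics.stdev(ys) / statistics.stdev(xs)
    return slope, statistics.fmean(ys) - slope * statistics.fmean(xs)


rng = random.Random(1975)  # fixed seed so the run is repeatable
higher = simulate_group(0.0, 1.0, 0.50, 20000, rng)
lower = simulate_group(-1.0, 1.0, 0.50, 20000, rng)

reg_high, reg_low = fit_regression(higher), fit_regression(lower)
eq_high, eq_low = fit_linear_equating(higher), fit_linear_equating(lower)

print("regression, higher group (slope, intercept):", reg_high)
print("regression, lower group  (slope, intercept):", reg_low)
print("equating,   higher group (slope, intercept):", eq_high)
print("equating,   lower group  (slope, intercept):", eq_low)
```

With these settings the two fitted regression lines should come out roughly
parallel and about half a standard deviation apart, while the equating lines
for the two groups should nearly coincide with Y = X, apart from sampling
error.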

This discussion has not presented a new concept of test fairness.
The equipercentile relationship, or an equivalent linear relationship,
has been discussed by Lord (1967), Thorndike (1971), Darlington (1971),
and Myers (Note 3). What may be new in this context is the concept of
proposing an inherently fair situation and considering what type of
analysis would be logical to use for its evaluation; that is, the
suggestion is to evaluate the statistical method by determining whether it
might be expected to give a fair and unbiased answer.

There are two implications of this approach to the question of test
fairness in comparison with the more traditional analysis of covariance
approach. First, the use of this method would tend to be less likely to
result in a decision that a test was biased against a lower-scoring group
than would the analysis of covariance method. Second, the use of this
method in admissions decisions would tend to result in more favorable
decisions for the higher-scoring members of lower-scoring groups.

Although this illustration used the prediction of one test score by
another, the model and principles may apply directly to the situation in
which the Test Y of the illustration is replaced by some criterion
performance such as grade-point average in college or productivity on a
job. But it is important to emphasize that the model is not appropriate to
all such circumstances. For example, it would not be appropriate if the
criterion itself were biased or irrelevant to the purpose of the test. In
our illustration the two variables were equal in a number of ways, such as
presumably equal in reliability, that would not commonly occur in a practical
situation. The equipercentile model is no more of a panacea than is the
analysis of covariance. It is always important that the assumptions in the
mathematical model should not be in violation of the facts of the
particular situation, and whatever model is chosen must be appropriate to those
facts.